32 research outputs found

    Multivariate Covariance Generalized Linear Models

    We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated measures and longitudinal structures, and the third involves a spatio-temporal analysis of rainfall data. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of different types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures. Comment: 21 pages, 5 figures
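
The covariance construction described in this abstract can be illustrated with a minimal sketch for a single response. It assumes an identity covariance link, a power variance function V(mu) = mu^p, and the form Sigma = V^(1/2) Omega V^(1/2) with Omega = tau_0 Z_0 + tau_1 Z_1; the abstract itself does not spell out these formulas, and the function names `matrix_linear_predictor` and `mcglm_covariance` are hypothetical, not part of any package.

```python
import math

def matrix_linear_predictor(taus, zs):
    """Omega = sum_d tau_d * Z_d for known square matrices Z_d (identity covariance link assumed)."""
    n = len(zs[0])
    omega = [[0.0] * n for _ in range(n)]
    for tau, z in zip(taus, zs):
        for i in range(n):
            for j in range(n):
                omega[i][j] += tau * z[i][j]
    return omega

def mcglm_covariance(mu, power, taus, zs):
    """Sigma = V^(1/2) Omega V^(1/2), with V = diag(mu_i ** power) (assumed power variance function)."""
    omega = matrix_linear_predictor(taus, zs)
    root_v = [math.sqrt(m ** power) for m in mu]
    n = len(mu)
    return [[root_v[i] * omega[i][j] * root_v[j] for j in range(n)] for i in range(n)]

# Illustrative inputs: three observations, Poisson-like variance (power = 1),
# an identity matrix plus a matrix of ones as the known matrices.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
ones = [[1.0] * 3 for _ in range(3)]
sigma = mcglm_covariance([1.0, 4.0, 9.0], 1.0, [2.0, 0.5], [identity, ones])
```

Here the first dispersion parameter scales the independent part of the variance, while the second adds a common correlation among all observations; other known matrices would encode temporal or spatial structure in the same way.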

    Hypothesis tests for multiple responses regression models in R: The htmcglm Package

    This article describes the R package htmcglm, implemented for performing hypothesis tests on regression and dispersion parameters of multivariate covariance generalized linear models (McGLMs). McGLMs provide a general statistical modeling framework for normal and non-normal multivariate data analysis along with a wide range of correlation structures. The package uses Wald statistics to perform general hypothesis tests and to build tailored ANOVAs, MANOVAs and multiple comparison tests. The goal of the package is to provide tools that improve the interpretation of regression and dispersion parameters. We assess the effects of the covariates on the response variables by testing the regression coefficients. Similarly, we perform tests on the dispersion coefficients in order to assess the correlation between study units. Such tests are of interest when the observations are correlated with each other, as in longitudinal, time series, spatial and repeated measures studies. The htmcglm package provides a user-friendly interface to perform MANOVA-like tests as well as multivariate hypothesis tests for models of the mcglm class. We describe the package implementation and illustrate it through the analysis of two data sets. The first deals with an experiment on soybean yield; the problem has three response variables of different types (continuous, count and binomial) and three explanatory variables (amount of water, fertilization and block). The second data set addresses a problem where the responses are longitudinal bivariate counts of hunted animals; the explanatory variables are the hunting method and the sex of the animal. With these examples we illustrate several tests in which the proposal proves useful for evaluating regression and dispersion parameters in problems with either dependent or independent observations. Comment: arXiv admin note: substantial text overlap with arXiv:2208.0002
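
The Wald statistic mentioned in this abstract can be sketched generically as W = (L b - c)' (L V L')^(-1) (L b - c), compared with a chi-square distribution whose degrees of freedom equal the number of rows of L. The sketch below is plain Python and does not reproduce the htmcglm interface; `wald_test`, `solve` and `chi2_sf` are hypothetical helper names, and L is assumed to have full row rank.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from df = 1 (odd) or df = 2 (even), then add one term per extra 2 df.
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def wald_test(beta, vcov, l_mat, c):
    """Return (W, df, p-value) for H0: L beta = c, assuming L has full row rank."""
    p = len(beta)
    diff = [sum(row[j] * beta[j] for j in range(p)) - c[i] for i, row in enumerate(l_mat)]
    lv = [[sum(row[k] * vcov[k][j] for k in range(p)) for j in range(p)] for row in l_mat]
    lvl = [[sum(lv_row[k] * other[k] for k in range(p)) for other in l_mat] for lv_row in lv]
    w = sum(d * s for d, s in zip(diff, solve(lvl, diff)))
    df = len(l_mat)
    return w, df, chi2_sf(w, df)

# Illustrative (made-up) estimates: test whether the last two coefficients are both zero.
beta_hat = [1.0, 2.0, -1.0]
vcov_hat = [[0.25, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.25]]
w, df, p_value = wald_test(beta_hat, vcov_hat, [[0, 1, 0], [0, 0, 1]], [0.0, 0.0])
```

Choosing the rows of L is what tailors the test: a single row tests one coefficient, while several rows test a whole factor at once, which is the idea behind ANOVA-like and MANOVA-like tables.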

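    Cross-cultural adaptation and validation of the Conditions of Work Effectiveness - Questionnaire-II instrument (Adaptação transcultural e validação do instrumento Conditions of Work Effectiveness - Questionnaire-II)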

    OBJECTIVE: this study aimed to translate and validate, for the Brazilian culture, the content of the instrument Conditions of Work Effectiveness - Questionnaire-II (CWEQ-II), developed by Laschinger, Finegan, Shamian and Wilk and modified from the original CWEQ. METHOD: the methodological procedure consisted of translation of the instrument into Portuguese; back-translation; assessment of semantic, idiomatic and cultural equivalence; and tests of the final version. The Portuguese version of the instrument was applied to a group of 40 nurses in two hospitals. RESULTS: the data yielded a Cronbach's alpha of 0.86 for the first hospital and 0.88 for the second. The results of the factor analysis are considered sufficiently satisfactory. CONCLUSION: we conclude that the instrument can be used in Brazil.
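
For readers unfamiliar with the reliability measure reported above, the standard formula for Cronbach's alpha is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies it to small made-up score tables; the data and the function name `cronbach_alpha` are illustrative only and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent and one column per item."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up example: four respondents answering two items.
alpha_two_items = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])

# Perfectly consistent items give the maximum value of 1.
alpha_perfect = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
```

Values close to 1 indicate that the items vary together, which is why figures such as 0.86 and 0.88 are usually read as good internal consistency.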

    Data Science: a description of the first undergraduate courses at Brazilian universities (Ciência de Dados: uma descrição dos primeiros cursos de graduação em universidades brasileiras)

    As data volumes increase, the demand for suitably qualified data scientists has grown, and Brazilian Higher Education Institutions (HEIs) have tried to meet it. In this scenario, the objective of this paper is to characterize undergraduate courses in Data Science. We aim to answer questions such as: are the courses offered mostly by public or private universities? When did they start being offered? Are they usually ODL (Online Distance Learning) or in-person? Are they technological or bachelor's degrees? Which groups of disciplines make up most of the curriculum? In which regions of the country are they concentrated? How are places offered, and what is the admissions profile in bachelor's and technological courses? To this end, the e-MEC database and the 2021 Higher Education Census were combined, and the data were explored and visualized using multiple correspondence analysis (MCA). The results show a certain balance between the in-person and online modalities; most of the courses are of the technological type and are usually offered by private HEIs. Regarding regions, a large share of in-person undergraduate courses is concentrated in the Southeast region of Brazil.

    Joint marginal modelling of height and volume for Araucaria angustifolia (MODELAGEM MARGINAL CONJUNTA DA ALTURA E VOLUME PARA Araucaria angustifolia)

    Variables measured in forests usually show some degree of correlation. Fitting models that estimate biometric variables independently is therefore not the most suitable approach, and multivariate models gain relevance because they can quantify associations between response variables. In this context, the objective of this research was to fit univariate and multivariate covariance generalized linear models (McGLMs) to estimate tree height and volume. Height, volume and diameter were measured on Araucaria angustifolia trees in a native forest in the state of Santa Catarina, Brazil. The McGLMs were fitted to estimate height and volume under both the univariate and the multivariate approach. The linear predictor of the models was fixed beforehand as a function of the diameter covariate for both responses. Because both responses showed an apparent pattern of non-constant variance, different structures for the matrix linear predictor were tested, with the effect of the covariate ranging up to a third-degree polynomial. In addition, a power parameter was estimated in both approaches in order to obtain a variance function for each variable. The parameter estimates from the univariate and multivariate approaches were similar. In general, the standard errors of the parameters were smaller for the multivariate models, a consequence of the correlation between the response variables. The results also suggested that a compound Poisson-gamma variance function is adequate for one of the response variables and a constant variance function for the other. The most suitable model was obtained with a matrix linear predictor depending only on a dispersion parameter associated with an identity matrix.

    Multivariate Generalized Linear Mixed Models for Count Data

    Univariate regression models for count data have a rich literature. However, this is not the case for multivariate count data. We therefore present the Multivariate Generalized Linear Mixed Models framework, which deals with a multivariate set of responses and measures the correlation between them through random effects that follow a multivariate normal distribution. The model is based on a GLMM with a random intercept, and the estimation process is the same as for a standard GLMM, with random effects integrated out via the Laplace approximation. We implemented this model efficiently through the TMB package available in R. We used Poisson, negative binomial (NB), and COM-Poisson distributions. To assess the properties of the estimators, we conducted a simulation study considering four different sample sizes and three different correlation values for each distribution. We obtained unbiased and consistent estimators for the Poisson and NB distributions; for the COM-Poisson distribution the estimators were consistent but biased, especially those for the dispersion, variance, and correlation parameters. The models were applied to two datasets. The first comes from 30 sites in Australia, at each of which the number of times each of 41 ant species was recorded; as a result, 820 variance-covariance and 41 dispersion parameters are estimated simultaneously, in addition to the regression parameters. The second is from the Australia Health Survey, with 5 response variables and 5190 respondents. Both datasets can be considered overdispersed according to the generalized dispersion index. The COM-Poisson model outperformed the other two on three goodness-of-fit measures: AIC, BIC, and maximized log-likelihood. It also estimated parameters with smaller standard errors and found a greater number of significant correlation coefficients. Therefore, the proposed model is capable of dealing with multivariate count data, whether under-, equi- or overdispersed, and of measuring any kind of correlation between the responses while taking into account the effects of the covariates.
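
The data-generating idea in this abstract, correlated normal random intercepts inducing correlation between count responses, can be illustrated with a small simulation. This sketch covers only the Poisson case with two responses and no covariates; it does not implement the Laplace-approximation fit or the TMB code, and all names and parameter values (`simulate_counts`, `beta0`, `sigma`, `rho`) are assumptions made for illustration.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(n, beta0, sigma, rho, seed):
    """Simulate n pairs of counts with log-means beta0 + b1 and beta0 + b2,
    where (b1, b2) is bivariate normal with sd sigma and correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1 = sigma * z1
        b2 = sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((rpois(math.exp(beta0 + b1), rng), rpois(math.exp(beta0 + b2), rng)))
    return pairs

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

correlated = simulate_counts(5000, 1.0, 0.7, 0.8, seed=1)
independent = simulate_counts(5000, 1.0, 0.7, 0.0, seed=2)
r_correlated = pearson([a for a, _ in correlated], [b for _, b in correlated])
r_independent = pearson([a for a, _ in independent], [b for _, b in independent])
```

The correlation between the observed counts is weaker than the correlation between the random effects, because the Poisson noise adds independent variation to each response; this is one reason the model estimates the latent correlation rather than reading it off the raw counts.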